Method and a system for determining the geometry and/or the localization of an object

ABSTRACT

A method for determining the geometry and/or the localisation of an object comprising the steps of:
sending one or more signals by using one transmitter;

receiving by one or more receivers the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces;

building by a computing module a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers;

adding to the EDM a new row and a new column, the new row and the new column comprising times of arrival of said echoes, and computing its rank or its distance to an EDM;

determining the geometry and/or the position of the object based on said rank or distance.

RELATED APPLICATION

The present application claims the priority of the Swiss patent application CH2935/12 of Dec. 22, 2012, the content of which is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention concerns a method and a system for determining the geometry and/or the localisation of an object, e.g. of a wall, a room, a microphone, a loudspeaker or a person. The invention concerns in particular the estimation of the geometry of a room from its acoustic room impulse responses (RIR).

DESCRIPTION OF RELATED ART

The problem of estimating the geometry of a room from its acoustic room impulse responses (RIR) can be summarized by a question: can a person blindfolded inside a room hear the shape of the room after having snapped his fingers? In other words, can the person reconstruct the 2-D or 3-D geometry of the room from the acoustic room impulse response (RIR)?

Beyond the question of uniqueness, meaning that the RIR is a unique signature of a room, the question of reconstructing the geometry from impulse responses is interesting algorithmically. That is, are there efficient ways to recover the room geometry from measured impulse responses?

Finally, establishing uniqueness would lead to localization inside a known (or unknown) room and to algorithms for tracking the trajectory of a moving source listening to the varying RIRs. Key questions are: how many sources, how many receivers, for what room shapes?

Different known documents have tried to answer the questions above. Moreover, there has recently been a renewed interest in reconstructing the room shape from acoustic responses, as shown by the increasing number of publications on the subject.

Some of these documents have used the image source model in order to cope with the signal reflections. This image source model, along with the first and second order echoes, is described in FIGS. 1 and 2.

FIG. 1 illustrates a room defined by the walls w1, w2 and by other walls not represented, and comprising a source or transmitter s and a receiver r. The source can be, for example and in a non-limitative way, a loudspeaker and the receiver a microphone. The walls are reflective surfaces, i.e. surfaces allowing a signal to be reflected, the angle at which the signal is incident on the surface being equal to the angle at which it is reflected.

A first audio signal transmitted by the source s is reflected by the wall w2. The reflected signal or echo e1 is then received by the receiver r. Since there is a single reflection of the transmitted signal before its reception by the receiver r, the echo e1 is a first-order echo. A second audio signal transmitted by the source is reflected first by one wall and then by another (the walls w1 and w2 of FIG. 1): the reflected signal or echo e2 is then received by the receiver r. Since there are two reflections of the transmitted signal before its reception by the receiver r, the echo e2 is a second-order echo.

The time of arrival (TOA) is defined as the travel time from a source s to a receiver r. The echoes e1 and e2 can have different times of arrival (TOAs).

FIG. 2 illustrates a system comprising a room defined by some walls (for the sake of clarity only three walls are represented), a source or transmitter s and a receiver r. The points $p_i$ and $p_{i+1}$ are the end-points of the i-th wall, $n_i$ is its unit, outward pointing normal and $\tilde{s}_i$ is an image source: in fact the signal $e_i$ received by the receiver r could be considered as generated by the image or virtual source $\tilde{s}_i$, which is the mirror image of the source s with respect to the wall defined by the points $p_i$ and $p_{i+1}$. $\tilde{s}_i$ is a first generation image source as the signal $e_i$ received by the receiver r has been reflected once by the wall. In other words $\tilde{s}_i$ is a first generation image source as $e_i$ is a first-order echo.

$\tilde{s}_{ij}$ is the image of $\tilde{s}_i$ with respect to the wall (i+1). It is then a second generation image source, generating a second-order echo.

The virtual sources $\tilde{s}_i$ or $\tilde{s}_{ij}$ are not real, tangible and concrete sources like the “real” source s. In other words they are abstract objects used for studying the signal reflections, according to the well-known image-source theory, used e.g. in optics.

The use of the reflections of a signal for the determination of the position of the real source and/or of the shape of a room is known from US2011317522. However the described algorithm does not find the source location directly, as it involves a large number of intermediate steps and hypotheses.

In U.S. Pat. No. 7,688,678 the volume of a room is determined by using the diffused field, i.e. without image sources.

In GEOMETRICALLY CONSTRAINED ROOM MODELING WITH COMPACT MICROPHONE ARRAYS, F. RIBEIRO, D. A. FLORENCIO, D. E. BA, AND C. ZHANG, IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 20, NO. 5, PP. 1449-1460, 2012, it is necessary to know in advance the mutual position of the microphones and of the loudspeaker. Moreover, since many impulse responses have to be measured by putting a fake wall at different positions with respect to the microphone array and the loudspeaker, the resulting matrix of shifted impulse responses is very large and therefore computationally expensive.

In INFERENCE OF ROOM GEOMETRY FROM ACOUSTIC IMPULSE RESPONSES, ANTONACCI, FILOS, THOMAS, HABETS, SARTI, NAYLOR, TUBARO, TO APPEAR IN IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2012, only the 2D geometry of a room is estimated, i.e. the floor and the ceiling are not estimated. In this case the discretized Hough transform is used. Moreover the described algorithm requires the source to be placed in many different positions.

In F. Antonacci, A. Sarti, and S. Tubaro, “Geometric reconstruction of the environment from its response to multiple acoustic emissions” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Dallas, 2010, pp. 2822-2825, the authors propose to move a loudspeaker around a microphone to collect multiple impulse responses and then estimate the distance and the angle of the reflector (a line since they consider a 2-D case) using the tools of projective geometry. Each source-receiver pair defines an ellipse of possible reflection points, and the wall is estimated as the common tangent to all of the ellipses.

M. Kuster, D. de Vries, E. M. Hulsebos, and A. Gisolf, “Acoustic imaging in enclosed spaces: Analysis of room geometry modifications on the impulse response”, Journal of the Acoustical Society of America, vol. 116, no. 4, pp. 2126-2137, 2004, describes an approach based on acoustic imaging. An array comprising many microphones is used to sample the sound field, and wave field inversion is then employed to infer the room.

J. Filos and E. A. P. Habets, “A two-step approach to blindly infer room geometries”, in Proceedings of the International Workshop on Acoustic Echo and Noise Control, 2010, propose to use projective geometry tools to infer the room geometry.

S. Tervo, “Localization and tracing of early acoustic reflections”, Ph.D. thesis, Aalto University, School of Science, Department of Media Technology, 2012, describes a method using directive loudspeakers, and then scanning the room for reflectors. The proposed method requires multiple emissions for the room to be scanned completely.

Some inventors of the present invention have previously worked on the problem of estimating the room geometry from a single RIR in I. Dokmanic, Y. M. Lu, and M. Vetterli, “Can One Hear the Shape of a Room: The 2-D Polygonal Case”, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Prague, 2011, EPFL. The described algorithm is based on the complete knowledge of first and second generation echo Times of Arrival (TOAs): however the second generation is often difficult to obtain for practical reasons (e.g. attenuation of the signal). The proposed algorithm is therefore not applicable in practice, since without second-order echoes a single RIR does not suffice for reconstructing the shape of the room.

The known solutions are thus often not applicable in practice. They are not exact, since some approximations are necessary (as for example in the Hough transform case). Some of them allow the reconstruction of the 2D geometry of a room only, without considering ceiling and floor. They also require a large number of receivers and/or transmitters.

It is an aim of the present invention to obviate or mitigate one or more of the aforementioned disadvantages.

BRIEF SUMMARY OF THE INVENTION

According to the invention, these aims are achieved by means of a method for determining the geometry and/or the localisation of an object according to claim 1, a system for determining the geometry and/or the localisation of an object according to claim 15, and a computer program product for determining the geometry and/or the localisation of an object according to claim 19.

The method according to the invention comprises the steps of

sending one or more signals by using one transmitter

receiving by one or more receivers the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces

building by a computing module a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers;

adding to the Euclidean Distance Matrix a new row and a new column, the new row and the new column comprising times of arrival of said echoes, and computing the rank of the modified matrix, or computing how far the modified matrix is from a true Euclidean Distance Matrix;

determining the geometry and/or the position of the object based on the computed information.

The first EDM (Euclidean Distance Matrix) corresponds to the receiver setup, which is known. For example, given some receivers $r_i$, the EDM $D \in \mathbb{R}^{M \times M}$ comprises the following elements:

$d_{ij} = \|r_{i} - r_{j}\|_{2}^{2}, \quad 1 \le i, j \le M$

where $\|\cdot\|_{2}$ is the Euclidean norm, so that each entry is a squared Euclidean distance.

The EDM is then a symmetric matrix with nonnegative entries and a zero diagonal.

Advantageously the proposed method performs an echo labelling. In fact, in order to know which of the peaks in the impulse responses received by the receivers (e.g. microphones) correspond to which reflective surface (e.g. wall of a room), instead of relying on different derived heuristics, intrinsic properties of point sets in Euclidean spaces are used. A particular property easily exploited is the rank property of EDMs, which says that the EDM corresponding to a point set in $\mathbb{R}^{n}$ has rank at most n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most 5.

The matrix D is augmented with a combination of M TOAs. This corresponds to adding a new row and a new column to D. If the augmented matrix $D_{aug}$ still verifies the rank property (or more generally, the EDM property), then the selected combination of echoes corresponds to an image source, or equivalently, to a reflective surface (e.g. a wall).

Even if this requires testing all the echo combinations, in practical cases the number of combinations is quite small and does not represent a problem: e.g. with M=4 microphones and 4 walls, only $4^{4}=256$ combinations have to be tested. Moreover, for each reflective surface there are not many correct combinations, but only one.

The number of combinations may be made even smaller by choosing to combine a particular echo received by one microphone only with those echoes from other microphones that were received within a temporal window corresponding to the size of the microphone setup.

An advantage of using the EDM approach is that it is exact, not approximate (unlike e.g. the discrete Hough transform). It thus provides a clear-cut criterion for good combinations of echoes.

It can be applied to many kinds of signals (acoustic signals, radio signals, UWB signals, etc.). It is very general and can be extended to multiple sources and multiple microphones very easily (as will be discussed here below, it can be applied to MIMO applications).

It requires only one source or transmitter. It can work with a small number of receivers, i.e. fewer than 5.

It can be used for determining a 3D geometry of a room.
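The temporal-window pruning described above can be sketched as follows. This is only an illustration under stated assumptions: the window is taken to be the pairwise triangle inequality |t_i - t_j| <= ||r_i - r_j|| (with the speed of sound normalised to 1), and the function name `prune_combinations`, the room and the microphone positions are all hypothetical.

```python
import itertools
import numpy as np

def prune_combinations(mics, toa_sets):
    """Keep echo combinations whose pairwise TOA gaps fit inside the array."""
    mics = np.asarray(mics, dtype=float)
    # Inter-microphone distances (speed of sound normalised to c = 1).
    gaps = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=-1)
    kept = []
    for combo in itertools.product(*toa_sets):
        t = np.asarray(combo)
        # Echoes of one image source cannot differ by more than the mic spacing.
        if np.all(np.abs(t[:, None] - t[None, :]) <= gaps + 1e-12):
            kept.append(combo)
    return kept

# Hypothetical 5 x 4 rectangular room: first-order image sources of a source at (2, 1.5).
images = np.array([[-2.0, 1.5], [8.0, 1.5], [2.0, -1.5], [2.0, 6.5]])
mics = np.array([[1.0, 1.0], [1.6, 2.2], [2.8, 2.5], [3.1, 1.2]])
dists = np.linalg.norm(images[:, None, :] - mics[None, :, :], axis=-1)  # shape (K, M)
# Each microphone only sees an unlabelled, sorted list of echo TOAs.
toa_sets = [sorted(dists[:, m]) for m in range(mics.shape[0])]
kept = prune_combinations(mics, toa_sets)
print(len(kept), "of", 4 ** 4, "combinations survive the window test")
```

The correct combinations always survive this test, so it can only reduce the number of candidates passed on to the rank or EDM test.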

In one preferred embodiment the method considers first-order echoes only; other echoes are not considered, and may be discarded. Therefore the method does not rely on a knowledge of second-order and higher-order echoes, which are difficult to measure.

In one preferred embodiment the object is a convex room, the transmitter is a loudspeaker, each receiver is a microphone, the geometry is a 2D geometry and the number of receivers is 3. In other words the proposed method allows the 2D geometry of a room to be determined by using one loudspeaker and only 3 microphones. The proposed method thus uses a reduced number of receivers for accurately determining the room's 2D geometry.

The proposed method based on EDM can be extended to determine the 3D geometry of a room, using at least 5 receivers.

The method according to the invention can comprise the determination of the location of the transmitter by using least-squared-distance trilateration. It can comprise multi-dimensional scaling. It can comprise applying an s-stress criterion.

The present invention concerns also a system for determining the geometry and/or the localisation of an object comprising

a transmitter for sending one or more signals;

one or more receivers for receiving the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces;

a first computing module for building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers, and optionally computing its rank;

a second computing module for adding to the EDM a new row and a new column, the new row and the new column comprising times of arrival of said echoes, and computing its rank or its distance from the first EDM;

a third computing module for determining the geometry and/or the position of the object based on said rank or distance.

In one preferred embodiment the first, second and third modules are the same module.

The present invention concerns also a computer program product for determining the geometry and/or the localisation of an object, comprising:

a tangible computer usable medium including computer usable program code being used for

building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers and optionally computing its rank;

adding to the EDM a new row and a new column, the new row and the new column comprising times of arrival of echoes of the signals transmitted by a transmitter as reflected by one or more reflective surfaces and received by one or more receivers, and computing its rank or its distance from the first EDM or set of EDMs;

determining the geometry and/or the position of the object by comparing the first rank with the second rank and/or based on said distance.

The present invention concerns also a computer data carrier storing presentation content created with the described method.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures, in which:

FIGS. 1 and 2 show a view of a room comprising a source or transmitter and a receiver.

FIG. 3 shows a view of a room comprising a source or transmitter and four receivers.

FIGS. 4A to 4C show the RIR received from each receiver.

FIG. 5 illustrates some possible room reconstructions due to incorrect echo labelling.

FIG. 6 illustrates the feasible region concept.

FIG. 7 illustrates an embodiment of a system according to the invention.

FIG. 8 illustrates an embodiment of a data processing system in which a method in accordance with an embodiment of the present invention may be implemented.

DETAILED DESCRIPTION OF POSSIBLE EMBODIMENTS OF THE INVENTION

The present invention will now be described in more detail in connection with its embodiment for determining the geometry of a room. However the present invention finds applicability in connection with many other fields, as will be discussed. Moreover the two-dimensional case will be described first for the sake of simplicity and illustration. The three-dimensional case then follows easily.

The method according to the invention uses the image source model. The idea in the image source model is that if there is a sound source on one side of the wall, then the sound field on the same side can be represented as a superposition of the original sound field and the one generated by a mirror image of the source with respect to the wall.

FIG. 2 illustrates the setup and the image source model. For our purposes, a room is either a convex planar K-polygon or a K-faced convex polyhedron. With the i-th side of the room we associate an outward pointing unit normal $n_i$, and define the normal matrix as $N = (n_1, \ldots, n_K)$. The reference $\tilde{s}_i$ denotes the image of the source s with respect to the i-th side.

In the case of FIG. 2 the following relation is valid

$\begin{matrix}{\tilde{s}_{i} = s + 2\langle p_{i} - s, n_{i}\rangle n_{i}} & (1)\end{matrix}$
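As a small worked illustration of the mirror-image relation, the sketch below reflects a source across one wall. It assumes the standard form $\tilde{s}_i = s + 2\langle p_i - s, n_i\rangle n_i$ with $n_i$ a unit outward normal; the function name `image_source` and the numbers are hypothetical.

```python
def image_source(s, p, n):
    """Mirror the source s across the wall through point p with unit outward normal n."""
    # Signed distance from the source to the wall, measured along the normal.
    dist = sum((pi - si) * ni for pi, si, ni in zip(p, s, n))
    return tuple(si + 2.0 * dist * ni for si, ni in zip(s, n))

# Hypothetical wall x = 3 with outward normal (1, 0) and a source at (1, 2).
s_tilde = image_source((1.0, 2.0), (3.0, 0.0), (1.0, 0.0))
print(s_tilde)  # (5.0, 2.0)
```

The image lies as far behind the wall as the source lies in front of it, which is why the first-order echo path length equals the distance from the image source to the receiver.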

In the remainder of this description we assume that the choice of units is such that the speed of sound is unity, c=1. Adjustments for the actual speed of sound are trivial.

By observing the impulse response and doing appropriate computations it is possible to access the first-order echoes, but also higher-order echoes.

FIG. 3 illustrates an array of M microphones in a 2-D room (in this case M=4), and a loudspeaker s at an arbitrary position in this room. The geometry enables the microphones to pick up the first-order echoes only.

The references r1 to r4 denote the receivers along with their positions. The same considerations apply to the source.

In general $r_m \in \mathbb{R}^{2}$ and $s \in \mathbb{R}^{2}$.

The EDM is the Euclidean Distance Matrix corresponding to the microphone setup, which is known. In the case of FIG. 3, the EDM $D \in \mathbb{R}^{M \times M}$ comprises the following elements:

$d_{ij} = \|r_{i} - r_{j}\|_{2}^{2}, \quad 1 \le i, j \le M$

where $\|\cdot\|_{2}$ is the Euclidean norm, so that each entry is a squared Euclidean distance.

The EDM is then a symmetric matrix with nonnegative entries and a zero diagonal.
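A minimal NumPy sketch of this construction is given below. The helper name `edm` and the microphone coordinates are hypothetical; the sketch only illustrates that the matrix of squared distances of a planar point set is symmetric, has a zero diagonal and has rank at most n+2=4.

```python
import numpy as np

def edm(points):
    """Return the matrix of squared Euclidean distances between the rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Hypothetical positions of six microphones in the plane.
mics = np.array([[0.0, 0.0], [1.0, 0.2], [0.3, 1.1],
                 [1.4, 1.3], [0.7, 0.5], [2.0, 0.1]])
D = edm(mics)
rank = np.linalg.matrix_rank(D)
print(D.shape, rank)
```

Although D is 6×6 here, its rank does not exceed 4 because the points live in a two-dimensional space.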

If the loudspeaker s fires a pulse, each microphone (it is assumed that all of them are in favourable positions so that they observe echoes from all the walls) will receive the direct sound and K first-order echoes. These echoes correspond to images of s across the K walls. The locations of the image sources are valid points of the plane $\mathbb{R}^{2}$.

If the distances between the image sources and the microphones are known, it is possible to reconstruct the locations of the image sources and hence the 2-D room.

In order to know which of the peaks Pi (see FIGS. 4A to 4C) in the impulse responses $RIR_i$ received by the microphones correspond to which wall (labelling problem), the EDM is used. Instead of relying on different derived heuristics, intrinsic properties of point sets in Euclidean spaces are used. A particular property easily exploited is the rank property of EDMs, which says that the EDM corresponding to a point set in $\mathbb{R}^{n}$ has rank at most n+2. In 2-D, its rank can thus be at most 4, and in 3-D at most 5.

The matrix D is augmented with a combination of M TOAs. This corresponds to adding a new row and a new column to D. If the augmented matrix $D_{aug}$ still verifies the rank property (or more generally, the EDM property), then the selected combination of echoes corresponds to an image source, or equivalently, to a reflective surface (e.g. a wall).

FIG. 5 illustrates some possible room reconstructions due to incorrect echo labelling, of which only a single one (reference 10 in the Figure) satisfies the EDM criterion. The image source location is estimated using least-squared-distance trilateration.
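The trilateration step can be sketched as below. The text does not specify the exact least-squares formulation, so this example assumes a common linearised variant (subtracting the first range equation from the others and solving with `numpy.linalg.lstsq`); the names `trilaterate` and `wall_from_image` and all coordinates are hypothetical.

```python
import numpy as np

def trilaterate(mics, dists):
    """Linearised least-squares estimate of a point from its distances to the microphones."""
    mics = np.asarray(mics, dtype=float)
    d2 = np.asarray(dists, dtype=float) ** 2
    # ||x - r_m||^2 - ||x - r_0||^2 = d_m^2 - d_0^2 is linear in x.
    A = 2.0 * (mics[1:] - mics[0])
    b = (np.sum(mics[1:] ** 2, axis=1) - np.sum(mics[0] ** 2)) - (d2[1:] - d2[0])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

def wall_from_image(s, image):
    """Wall normal, a point on the wall, and the wall distance from the loudspeaker."""
    offset = np.asarray(image, dtype=float) - np.asarray(s, dtype=float)
    length = np.linalg.norm(offset)
    # The wall is the perpendicular bisector of the segment source / image source.
    return offset / length, np.asarray(s, dtype=float) + 0.5 * offset, 0.5 * length

# Hypothetical setup: source at (2, 1.5), wall x = 5, hence image source at (8, 1.5).
s = np.array([2.0, 1.5])
image = np.array([8.0, 1.5])
mics = np.array([[1.0, 1.0], [1.6, 2.2], [2.8, 2.5], [3.1, 1.2]])
dists = np.linalg.norm(mics - image, axis=1)  # correctly labelled echo TOAs (c = 1)

estimate = trilaterate(mics, dists)
normal, point, wall_distance = wall_from_image(s, estimate)
print(estimate, normal, point, wall_distance)
```

With correctly labelled, noiseless TOAs the estimate coincides with the image source, and the wall follows as the perpendicular bisector between loudspeaker and image.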

Characterisation of Correct TOA Vectors

Denote by $\tau_m$ the set of first-order echo TOAs received by the m-th microphone. The matrix D is now augmented by a vector t so that

$\begin{matrix}{D_{aug} = \begin{pmatrix}D & t \\t^{T} & 0\end{pmatrix}} & (2)\end{matrix}$

where the vector t is formed by taking one TOA from each microphone. In particular $t = (t_{1}^{2}, \ldots, t_{M}^{2})^{T}$, with $t_m \in \tau_m$. It is possible to state the following lemma:

Lemma 1.

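If an M-tuple of echoes $t = \{t_1, \ldots, t_M\}$ is such that rank $D_{aug} < 5$, then $(t^{T}, 0)^{T} \in \mathcal{R}\left((D, t)^{T}\right)$, where $\mathcal{R}(\cdot)$ denotes the range (column space). In particular, if M=4 and the microphones are not collinear or on a circle, we have that $t^{T}D^{-1}t = 0$.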

Proof.

The first part is obvious. For the second part, let t be a vector that corresponds to a four-tuple such that rank $(D_{aug}) < 5$. It is possible to write

$\begin{matrix}{D_{aug} = \begin{pmatrix}D & t \\t^{T} & 0\end{pmatrix}} & (3)\end{matrix}$

If the rank of this matrix is 4 or less, it is possible to represent the last column as a linear combination of the first four columns. This in turn means that ∃v such that

$\begin{matrix}{{\begin{pmatrix}D \\t^{T}\end{pmatrix}v} = \begin{pmatrix}t \\0\end{pmatrix}} & (4)\end{matrix}$

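By components, one has

$\begin{matrix}{Dv = t, \quad t^{T}v = 0.} & (5)\end{matrix}$

Under the assumptions of the second part, D is invertible, so that $v = D^{-1}t$; substituting into the second equation yields $t^{T}D^{-1}t = 0$, which is the result.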
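The lemma can be checked numerically. The sketch below is an illustration only, with hypothetical microphone and image-source coordinates: it evaluates $t^{T}D^{-1}t$ for squared distances that really come from one point of the plane, and for a vector in which one distance has been deliberately corrupted.

```python
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def quadratic_form(D, dists):
    """Return t^T D^{-1} t, where t holds the squared distances."""
    t = np.square(np.asarray(dists, dtype=float))
    return float(t @ np.linalg.solve(D, t))

# Four hypothetical microphones, neither collinear nor on a common circle.
mics = np.array([[1.0, 1.0], [1.6, 2.2], [2.8, 2.5], [3.1, 1.2]])
D = edm(mics)

image = np.array([8.0, 1.5])                  # hypothetical image source
true_dists = np.linalg.norm(mics - image, axis=1)
wrong_dists = true_dists.copy()
wrong_dists[0] += 1.0                         # one mislabelled echo

q_true = quadratic_form(D, true_dists)
q_wrong = quadratic_form(D, wrong_dists)
print(q_true, q_wrong)
```

Up to rounding, the quadratic form vanishes for the consistent distances, whereas the corrupted vector gives a clearly non-zero value.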

Corollary 1.

Let

$Z = \left\{ t \in \mathbb{R}^{M} : \operatorname{rank}\begin{pmatrix}D & t \\ t^{T} & 0\end{pmatrix} < 5 \right\} \subset \mathbb{R}^{M}.$

Then dim Z<M, that is, μ(Z)=0, where μ is the Lebesgue measure in $\mathbb{R}^{M}$.

Proof.

Immediate from Lemma 1.

This means that it is possible to test all possible M-tuples generated from the collected RIRs and know that those that yield a singular $D_{aug}$ correspond to image sources. With 4 walls and 4 microphones, we have $4^{4}=256$ combinations. This amounts to 256 SVDs of a 5×5 matrix, which can be computed very fast, so the combinatorial aspect is not an issue. After finding the corresponding rows it is possible to triangulate to find the actual locations of the image sources, and from there find the walls.

The described procedure is summarized in Algorithm 1.

Algorithm 1 NOISELESS ROOM RECOVERY
Input: Times of Arrival $T_1, \ldots, T_M$
Output: Room walls
1: for every $\sqrt{t} \in T_1 \times \cdots \times T_M$ do
2:   Build the matrix $D_{aug} = \begin{pmatrix}D & t \\ t^{T} & 0\end{pmatrix}$,
3:   if rank $D_{aug} \le 4$, or equivalently, $[t^{T}, 0]^{T} \in \mathcal{R}([D, t]^{T})$ then
4:     Triangulate the location of the image source corresponding to t,
5:     Compute the wall normal as the vector from the loudspeaker to the image source,
6:     Compute the distance of the wall from the loudspeaker.
7:   end if
8: end for
9: Reconstruct the convex room using the collected information.
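The labelling loop of Algorithm 1 (steps 1 to 3) can be sketched in NumPy as follows. This is an illustrative sketch under assumptions, not the patented implementation: the rank test is done by thresholding a singular value with a hypothetical relative tolerance `tol`, and the rectangular room, source and microphone positions are invented for the example.

```python
import itertools
import numpy as np

def edm(points):
    """Matrix of squared Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def label_echoes(D, toa_sets, max_rank=4, tol=1e-9):
    """Return the TOA combinations whose augmented matrix has rank <= max_rank."""
    M = D.shape[0]
    accepted = []
    for combo in itertools.product(*toa_sets):
        t = np.square(combo)                  # squared distances, c = 1
        D_aug = np.zeros((M + 1, M + 1))
        D_aug[:M, :M] = D
        D_aug[:M, M] = t
        D_aug[M, :M] = t
        sv = np.linalg.svd(D_aug, compute_uv=False)
        if sv[max_rank] < tol * sv[0]:        # (max_rank + 1)-th singular value ~ 0
            accepted.append(combo)
    return accepted

# Hypothetical 5 x 4 rectangular room, each wall given as (point, outward unit normal).
walls = [((0.0, 0.0), (-1.0, 0.0)), ((5.0, 0.0), (1.0, 0.0)),
         ((0.0, 0.0), (0.0, -1.0)), ((0.0, 4.0), (0.0, 1.0))]
s = np.array([2.1, 1.4])
images = np.array([s + 2.0 * np.dot(np.array(p) - s, n) * np.array(n)
                   for p, n in walls])
mics = np.array([[1.03, 0.97], [1.61, 2.24], [2.83, 2.48], [3.12, 1.19]])

dists = np.linalg.norm(images[:, None, :] - mics[None, :, :], axis=-1)  # (K, M)
toa_sets = [sorted(dists[:, m]) for m in range(mics.shape[0])]  # unlabelled echoes
accepted = label_echoes(edm(mics), toa_sets)
print(len(accepted), "combinations pass the rank test out of", 4 ** 4)
```

Each accepted combination can then be passed to a trilateration routine to obtain the image source and the wall. For the three-dimensional case the same loop would be run with `max_rank=5` and at least five microphones.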

Three-Dimensional Case

In the three-dimensional case, at least 5 microphones are needed to apply the EDM method (see below for a method that allows 4 to be used). Only slight adjustments are needed to reflect the change of the ambient dimension. In fact, it is possible to apply Algorithm 1 directly, but instead of testing whether rank $D_{aug} \le 4$, one has to test whether rank $D_{aug} \le 5$.

Uniqueness

The goal is to show that the probability for the described algorithm to fail is 0. To this end, a set of “good” rooms in which the algorithm can be applied is defined, and two theorems about the uniqueness of the solution are then stated. Since the algorithm relies on the knowledge of the first-order TOAs, it is required that the microphones hear them. This defines a “good” room, which is in fact a combination of a room geometry and the microphone array/loudspeaker location.

Definition 1 (Feasibility).

Given a room R and a loudspeaker position s, the point $x \in R$ is feasible if a microphone placed at x receives all the first-order echoes of a pulse emitted from s. The interior of the set of all feasible points is called a feasible region.

FIG. 6 illustrates the concept of a feasible region. With this definition it is possible to state the first uniqueness result.

Theorem 1.

Assume we are given a room and a source location. Assume further that the room-loudspeaker combination generates a non-empty feasible region and that the microphones are placed uniformly at random in the feasible region. Then with probability 1, there is only one room corresponding to the collected RIRs and it can be retrieved by Algorithm 1.

Sketch of proof. Fix any configuration of microphones $(r_1, \ldots, r_M)$ such that all $r_m$ are in the feasible region. This microphone configuration induces an M-tuple of first-order TOAs, $t_0 = (t_1, \ldots, t_M)^{T}$. Now since the feasible region is open, there is some $\varepsilon = \varepsilon(r_1, \ldots, r_M) > 0$ such that we can achieve any $t \in B_{\varepsilon}(t_0)$ by adjusting the microphone positions. To see this, one can observe that it is possible to adjust each $t_m$ independently of the others by moving the corresponding microphone.

Since this is true of any $t_0$ one might generate, it follows that the space of possible TOA combinations is the union of all such open balls, and thus M-dimensional. By Corollary 1, the dimension of the set of the M-tuples t that pass the EDM test is smaller than M. But μ(A)=0 if dim(A)<M, where μ is the Lebesgue measure in $\mathbb{R}^{M}$. It is possible to note that the probability distribution induced on the t's is non-singular, since the mapping from the microphone configuration to t is continuous and the Jacobian of the mapping is non-zero, so the claim follows. Alternatively, by the same token, it is possible to note that the measure of all Rs that give viable t's is zero, and directly conclude.

Remark:

A good way to think about this is that one can draw $K^{M}$ samples from the non-singular (continuous) probability distribution on the set of M-tuples t. By definition of the continuous probability distribution, the probability to draw a sample from a set with Lebesgue measure 0 must be 0 itself. It might appear surprising that even if the probability to nail the correct M-tuple is zero, one always has K correct ones. This is easy to explain by noting that the echoes corresponding to one single wall are not independent, but they are independent of the other echoes.

Theorem 2.

Assume you are given a fixed microphone array and a loudspeaker position. A room is generated at random in such a manner that the array is in the feasible region. Then with probability 1, there is only one room corresponding to the collected RIRs and it can be retrieved by Algorithm 1.

The meaning of these theorems is essentially that in whatever room one runs the algorithm, provided that the microphones are in the feasible region, the solution is unique.

A Subspace Approach

The approach described in the previous section requires at least four microphones in the 2-D case, and five microphones in the 3-D case. Another approach is now described that works with a minimal number of microphones (minimal in the sense that one cannot use fewer by exploiting only the first-order TOA information).

It is always possible to choose the origin of the coordinate system so that one has

$\begin{matrix}{{\sum\limits_{m = 1}^{M}r_{m}} = 0} & (6)\end{matrix}$

with $r_m = (r_m^{x}, r_m^{y})^{T}$. Let $\tilde{s}_k$ be the location vector of one image source (with respect to the wall k). Then, up to a possible permutation, one receives at each microphone the squared distance information,

$\begin{matrix}{y_{k,m}\overset{def}{=}\langle \tilde{s}_{k} - r_{m}, \tilde{s}_{k} - r_{m}\rangle = \|\tilde{s}_{k}\|^{2} - 2\langle \tilde{s}_{k}, r_{m}\rangle + \|r_{m}\|^{2}.} & (7)\end{matrix}$

Define further

$\tilde{y}_{k,m}\overset{def}{=} -\frac{1}{2}\left( y_{k,m} - \|r_{m}\|^{2} \right) = \langle r_{m}, \tilde{s}_{k}\rangle - \frac{1}{2}\|\tilde{s}_{k}\|^{2}$

We have in vector form

$\begin{matrix}{\begin{pmatrix}\tilde{y}_{k,1} \\ \tilde{y}_{k,2} \\ \vdots \\ \tilde{y}_{k,M}\end{pmatrix} = \begin{pmatrix}r_{1}^{T} & -\frac{1}{2} \\ r_{2}^{T} & -\frac{1}{2} \\ \vdots & \vdots \\ r_{M}^{T} & -\frac{1}{2}\end{pmatrix}\begin{pmatrix}\tilde{s}_{k} \\ \|\tilde{s}_{k}\|^{2}\end{pmatrix}} & (8)\end{matrix}$

Denote by M the above matrix,

$\begin{matrix}{M\overset{def}{=}\begin{pmatrix}r_{1}^{T} & {- \frac{1}{2}} \\r_{2}^{T} & {- \frac{1}{2}} \\\vdots & \vdots \\r_{M}^{T} & {- \frac{1}{2}}\end{pmatrix}} & (9)\end{matrix}$

and set

${{\overset{\sim}{y}}_{k}\overset{def}{=}\left( {{\overset{\sim}{y}}_{k,1},\ldots \mspace{14mu},{\overset{\sim}{y}}_{k,M}} \right)^{T}},{{\overset{\sim}{u}}_{k}\overset{def}{=}{\left( {s_{k}^{x},s_{k}^{y},{{\overset{\sim}{s}}_{k}}^{2}} \right)^{T}.}}$

We write the above expression (8) co{tilde over (y)}_(k)=Mũ_(k∃).

Thanks to the condition that

${{\sum\limits_{m = 1}^{M}r_{m}} = 0},$

we have that

$\begin{matrix}{1^{T}\tilde{y}_{k} = -\frac{M}{2}\|\tilde{s}_{k}\|^{2}} & (10)\end{matrix}$

i.e.

$\begin{matrix}{\|\tilde{s}_{k}\|^{2} = -\frac{2}{M}\sum\limits_{m = 1}^{M}\tilde{y}_{k,m}.} & (11)\end{matrix}$

Furthermore,

$\begin{matrix}{\tilde{s}_{k} = A\tilde{y}_{k},} & (12)\end{matrix}$

where A is a matrix such that

$\begin{matrix}{{AM} = {\begin{pmatrix}1 & 0 & 0 \\0 & 1 & 0\end{pmatrix}.}} & (13)\end{matrix}$

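These two conditions provide a complete characterisation of the distance information. In practice, it is sufficient to verify the linear constraint

$\begin{matrix}{\tilde{y}_{k} \in \mathcal{R}(M),} & (14)\end{matrix}$

where $\mathcal{R}(M)$, the range (column space) of the matrix M, is a proper subspace of $\mathbb{R}^{M}$ when the number of microphones satisfies M≧4.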
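The subspace test can be sketched numerically as follows. The example assumes the 2-D case with four microphones re-centred so that their positions sum to zero, as in (6); the function name `subspace_check`, the coordinates, and the use of a least-squares residual as the membership test for the range of M are assumptions made for illustration.

```python
import numpy as np

def subspace_check(mics, sq_dists):
    """Test whether y~ lies in range(M) and recover the image source (centred coordinates)."""
    mics = np.asarray(mics, dtype=float)
    n_mics = mics.shape[0]
    assert np.allclose(mics.sum(axis=0), 0.0), "microphones must be centred, see (6)"
    # Matrix M of (9): rows (r_m^T, -1/2).
    Mmat = np.hstack([mics, -0.5 * np.ones((n_mics, 1))])
    # y~_{k,m} = -1/2 (y_{k,m} - ||r_m||^2)
    y_tilde = -0.5 * (np.asarray(sq_dists, dtype=float) - np.sum(mics ** 2, axis=1))
    u, *_ = np.linalg.lstsq(Mmat, y_tilde, rcond=None)
    residual = np.linalg.norm(Mmat @ u - y_tilde)   # zero iff y~ is in range(M)
    norm_sq = -2.0 / n_mics * y_tilde.sum()         # relation (11)
    return u[:2], u[2], norm_sq, residual

# Hypothetical microphones and image source, shifted so the array centroid is the origin.
raw_mics = np.array([[1.0, 1.0], [1.6, 2.2], [2.8, 2.5], [3.1, 1.2]])
centroid = raw_mics.mean(axis=0)
mics = raw_mics - centroid
image = np.array([8.0, 1.5]) - centroid

sq = np.sum((mics - image) ** 2, axis=1)            # correctly labelled y_{k,m}
s_hat, u3, norm_sq, residual = subspace_check(mics, sq)

sq_wrong = sq.copy()
sq_wrong[0] += 5.0                                  # one mislabelled echo
residual_wrong = subspace_check(mics, sq_wrong)[3]
print(s_hat, residual, residual_wrong)
```

For a correct combination the residual vanishes and the first two components of the solution give the image source directly, whereas a mislabelled combination leaves a non-zero residual.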

This approach makes it possible to formulate theorems and algorithms equivalent to the ones for the EDM formulation, with analogous arguments. But more than that, it is possible to use the nonlinear condition (11) to solve the problem with only 3 microphones in 2-D and 4 microphones in 3-D.

Theorem 3.

The minimal number of microphones required to hear the room, given that they observe the first-order echoes, is 3 in 2-D and 4 in 3-D.

Proof.

Construct a family of counterexamples for M=2.

Practical Considerations—Working with Uncertainties

In practice one encounters several sources of error. The first error term comes from the uncertainty when measuring the inter-microphone distances, that is

$\begin{matrix}{d_{ij} = \bar{d}_{ij} + e_{ij},} & (15)\end{matrix}$

so that

$\begin{matrix}{D = \bar{D} + E,} & (16)\end{matrix}$

where the bar marks the true (error-free) quantity and E is a symmetric, zero-diagonal error matrix.

This can be dealt with by calibration, but note that the schemes proposed in the following seem to be very stable with respect to uncertainties in array calibration.

The second source of error comes from the effects of the finite sampling rate and the finite precision of peak-picking algorithms. Some of this can be alleviated by using a high sampling rate and better time-of-arrival estimation algorithms.

However it is better to use some kind of distance measure between the measured/assembled $D_{aug}$ and some feasible $D_{aug}$. One possible approach would be to build a heuristic based on the singular values of $D_{aug}$. Such an approach, however, would capture only the rank requirement on the matrix. But the requirement that $D_{aug}$ be an EDM brings in many more subtle dependencies between its elements. For instance one has that

$\begin{matrix}{\left( I - \frac{1}{n}11^{T} \right)D_{aug}\left( I - \frac{1}{n}11^{T} \right) \preceq 0.} & (17)\end{matrix}$

Furthermore, (17) does not allow the ambient dimension of the point set to be specified. Imposing this constraint leads to even more dependencies between the matrix elements, and the resulting space of matrices is no longer a cone (it is actually no longer convex). Nevertheless, it is possible to use a family of algorithms known as multidimensional scaling (MDS) to find the closest EDM generated by points in a fixed ambient dimension.
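Condition (17) and the ambient dimension can be examined through the eigenvalues of the doubly centred matrix. The following sketch assumes that (17) is the negative-semidefiniteness condition on the centred matrix (equivalently, that $G = -\frac{1}{2}JDJ$ with $J = I - \frac{1}{n}11^{T}$ is positive semidefinite); the helper names, the tolerance and the example matrices are hypothetical.

```python
import numpy as np

def gram_eigenvalues(D):
    """Eigenvalues of G = -1/2 J D J, with J = I - (1/n) 1 1^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * J @ D @ J)

def is_edm(D, tol=1e-9):
    """True if J D J is negative semidefinite (D assumed symmetric with zero diagonal)."""
    return bool(gram_eigenvalues(D).min() >= -tol)

def embedding_dimension(D, tol=1e-9):
    """Number of significantly positive eigenvalues of G, i.e. the ambient dimension."""
    return int(np.sum(gram_eigenvalues(D) > tol))

# Four hypothetical microphones plus one image source, all in the plane.
points = np.array([[1.0, 1.0], [1.6, 2.2], [2.8, 2.5], [3.1, 1.2], [8.0, 1.5]])
diff = points[:, None, :] - points[None, :, :]
D_aug = np.sum(diff ** 2, axis=-1)

# Squared "distances" 1, 1 and 100 violate the triangle inequality: not an EDM.
bad = np.array([[0.0, 1.0, 100.0], [1.0, 0.0, 1.0], [100.0, 1.0, 0.0]])

print(is_edm(D_aug), embedding_dimension(D_aug), is_edm(bad))
```

A matrix can satisfy this test and still require more than two dimensions, which is why the ambient dimension has to be checked separately or enforced through multidimensional scaling.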

Multidimensional Scaling

As pointed out, in the presence of noise it is not advisable to use the rank test on D_(aug). A very good way (as verified through simulations) to deal with this nuisance is to measure how close D_(aug) is to a true EDM. To measure this distance, it is possible to use Multidimensional Scaling to construct a point set in a given dimension (either 2-D or 3-D) which produces the EDM “closest” to D_(aug).

Multidimensional Scaling (MDS) was originally proposed in psychometrics as a method for data visualization. Many variations have been proposed to adapt the method for sensor localization.

Here the s-stress criterion proposed by Takane, Young and de Leeuw (1977) is used. Given an observed noisy matrix {tilde over (D)}, the s-stress criterion is

${s\left( \overset{\sim}{D} \right)} = {{minimize}{\sum\limits_{i,j}\left( {d_{i,j}^{2} - {\overset{\sim}{d}}_{i,j}^{2}} \right)^{2}}}$subject  to  D ∈ ².

We call s({tilde over (D)}) the score of the matrix {tilde over (D)}. By EDM² we denote the set of EDMs with embedding dimension 2 (produced by point sets in 2-D). In the 3-D case, EDM² is replaced by EDM³.

From now on, it is assumed that the target space is R². The 3-D adaptation is immediate. If one associates to each point in R² a coordinate vector x_(i)=(x_(i),y_(i))^(T), one has that d²_(i,j)=∥x_(i)−x_(j)∥₂²=(x_(i)−x_(j))²+(y_(i)−y_(j))².

Thus, the s-stress criterion can be rephrased as

$\begin{matrix}{{s\left( \overset{\sim}{D} \right)} = {\underset{x_{i},{y_{i} \in {\mathbb{R}}}}{minimize}{\sum\limits_{i,j}\left\lbrack {\left( {x_{i} - x_{j}} \right)^{2} + \left( {y_{i} - y_{i}} \right)^{2} - {\overset{\sim}{d}}_{i,j}^{2}} \right\rbrack^{2}}}} & (18)\end{matrix}$

The objective function in (18) is not convex. However, it has been shown to have fewer local minima than other MDS criteria. Furthermore, it yields a meaningful definition of the distance of a matrix from an optimal EDM.
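As an illustration, the sum in (18) can be evaluated for any candidate configuration of points. The sketch below assumes NumPy and a matrix of observed squared distances; the name `s_stress_value` is hypothetical. Note that it only evaluates the objective at a given configuration; the score s({tilde over (D)}) is the minimum of this value over all configurations.

```python
import numpy as np

def s_stress_value(X, D_tilde):
    """Evaluate the sum in (18) for a candidate configuration X (n x 2)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sum(diff ** 2, axis=-1)          # squared distances d_ij^2
    return float(np.sum((D - np.asarray(D_tilde, dtype=float)) ** 2))


# Usage: the generating configuration has zero s-stress, a displaced one does not.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D_obs = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
X_moved = X.copy()
X_moved[0] = [1.0, 0.0]
print(s_stress_value(X, D_obs), s_stress_value(X_moved, D_obs))
```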

To further avoid the local minima of (18), it is possible to use coordinate alternation to find the optimal EDM: (18) is minimized first over x_(i) and then over y_(i). Although this approach is suboptimal compared to simultaneous minimization over both coordinates, it leads to simpler computations.

Assuming that x_(i) has to be updated by Δx_(i) to give the minimum of s({tilde over (D)}), one will have

$\begin{matrix}{{{s\left( \overset{\sim}{D} \right)}_{i}^{({k + 1})} = {\overset{n}{\sum\limits_{j = 1}}\left\lbrack {\left( {x_{i}^{(k)} + {\Delta \; x_{i}^{({k + 1})}} - x_{j}^{(k)}} \right)^{2} + \left( {y_{i}^{(k)} - y_{j}^{(k)}} \right)^{2} - {\overset{\sim}{d}}_{i,j}^{2}} \right\rbrack^{2}}},} & (19)\end{matrix}$

where (•)^((k)) denotes the value at iteration k. Taking the derivative of s({tilde over (D)})_(i)^((k+1)) with respect to Δx_(i)^((k+1)), one will have

$\begin{matrix}{\frac{\partial {s\left( \tilde{D} \right)}_{i}^{(k+1)}}{\partial \Delta x_{i}^{(k+1)}} = 4\left\lbrack n\left( \Delta x_{i}^{(k+1)} \right)^{3} + 3\sum\limits_{j=1}^{n}\left( x_{i}^{(k)} - x_{j}^{(k)} \right)\left( \Delta x_{i}^{(k+1)} \right)^{2} + \sum\limits_{j=1}^{n}\left\lbrack 3\left( x_{i}^{(k)} - x_{j}^{(k)} \right)^{2} + \left( y_{i}^{(k)} - y_{j}^{(k)} \right)^{2} - \tilde{d}_{i,j}^{2} \right\rbrack \Delta x_{i}^{(k+1)} + \sum\limits_{j=1}^{n}\left\lbrack \left( x_{i}^{(k)} - x_{j}^{(k)} \right)^{3} + \left( x_{i}^{(k)} - x_{j}^{(k)} \right)\left( y_{i}^{(k)} - y_{j}^{(k)} \right)^{2} - \left( x_{i}^{(k)} - x_{j}^{(k)} \right)\tilde{d}_{i,j}^{2} \right\rbrack \right\rbrack.} & (20)\end{matrix}$

Setting (20) to zero yields a cubic equation in Δx_(i)^((k+1)) with at most three real solutions; comparing the value of s({tilde over (D)})_(i)^((k+1)) at these solutions gives the optimal value for Δx_(i)^((k+1)).
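The single-coordinate update can be sketched as follows, assuming NumPy and a matrix of observed squared distances. The names `row_terms`, `row_cost` and `best_delta_x` are illustrative. With the shorthand a_j = x_i − x_j and b_j = (y_i − y_j)² − d̃²_(i,j), the derivative of (19) is, up to a common factor 4 that does not affect its roots, the cubic nΔ³ + 3(Σa_j)Δ² + Σ(3a_j² + b_j)Δ + Σ(a_j³ + a_j b_j). As in (19), the sum runs over all j, including j = i.

```python
import numpy as np

def row_terms(i, x, y, D_tilde):
    """Shorthand for (19): a_j = x_i - x_j and b_j = (y_i - y_j)^2 - d~_ij^2."""
    a = x[i] - x
    b = (y[i] - y) ** 2 - D_tilde[i]
    return a, b

def row_cost(delta, a, b):
    """Value of (19) for a trial update delta."""
    return float(np.sum(((a + delta) ** 2 + b) ** 2))

def best_delta_x(i, x, y, D_tilde):
    """Return the update of x_i that minimises (19)."""
    a, b = row_terms(i, x, y, D_tilde)
    n = a.size
    # Cubic obtained by setting the derivative to zero (common factor 4 dropped).
    coeffs = [n, 3.0 * a.sum(), (3.0 * a ** 2 + b).sum(), (a ** 3 + a * b).sum()]
    # The quartic (19) attains its minimum at a real root; real parts of complex
    # roots are kept only as harmless extra candidates.
    candidates = np.roots(coeffs).real
    return float(min(candidates, key=lambda d: row_cost(d, a, b)))


# Usage: displace one x-coordinate of a unit square and compute its update.
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
P = np.stack([x, y], axis=1)
D_obs = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
x_moved = x.copy()
x_moved[0] = 0.3
delta = best_delta_x(0, x_moved, y, D_obs)
```

Evaluating the cost at every candidate root, rather than taking the first root found, is what selects the global minimiser of the one-dimensional problem.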

The complete optimization procedure is summarized in Algorithm 2.

Algorithm 2 COORDINATE ALTERNATION FOR S-STRESS OPTIMIZATION
Input: Symmetric and zero-diagonal matrix {tilde over (D)}
Output: Estimated positions x and score s({tilde over (D)})
1: Assume an initial configuration for the points x⁰
2: repeat
3:  for i = 1 to n do
4:   Keep the positions of all points other than i fixed,
5:   Update x_(i) using the i^(th) row of {tilde over (D)},
6:   Update y_(i) using the i^(th) row of {tilde over (D)},
7:  end for
8: until convergence or maximum number of iterations is reached.
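A compact sketch of Algorithm 2 is given below, assuming NumPy, observed squared distances, and a caller-supplied initial configuration. The names `s_stress_mds` and `_best_step`, the stopping tolerance `tol` and the iteration cap `max_iter` are assumptions made for illustration. Each coordinate update solves the cubic obtained from (19), and the loop stops when the s-stress no longer decreases.

```python
import numpy as np

def _best_step(a, b):
    """Minimise sum_j ((a_j + d)^2 + b_j)^2 over d via the roots of its derivative."""
    coeffs = [a.size, 3.0 * a.sum(), (3.0 * a ** 2 + b).sum(), (a ** 3 + a * b).sum()]
    cand = np.roots(coeffs).real            # minimum is attained at a real root
    costs = [np.sum(((a + d) ** 2 + b) ** 2) for d in cand]
    return cand[int(np.argmin(costs))]

def s_stress_mds(D_tilde, X0, max_iter=200, tol=1e-12):
    """Coordinate alternation for the s-stress criterion (sketch of Algorithm 2)."""
    D_tilde = np.asarray(D_tilde, dtype=float)
    X = np.array(X0, dtype=float)           # step 1: initial configuration
    n, dim = X.shape
    prev = np.inf
    for _ in range(max_iter):               # steps 2 and 8: repeat until done
        for i in range(n):                  # step 3
            for c in range(dim):            # steps 5 and 6: x_i, then y_i
                others = [k for k in range(dim) if k != c]
                a = X[i, c] - X[:, c]
                b = np.sum((X[i, others] - X[:, others]) ** 2, axis=1) - D_tilde[i]
                X[i, c] += _best_step(a, b)
        diff = X[:, None, :] - X[None, :, :]
        score = float(np.sum((np.sum(diff ** 2, axis=-1) - D_tilde) ** 2))
        if prev - score < tol:              # no further decrease: converged
            break
        prev = score
    return X, score


# Usage: start from a perturbed copy of the true points.
X_true = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0], [1.0, 3.0]])
D_obs = np.sum((X_true[:, None, :] - X_true[None, :, :]) ** 2, axis=-1)
X0 = X_true + 0.1 * np.array([[1, -1], [0, 1], [-1, 0], [1, 1], [0, -1]])
X_hat, score = s_stress_mds(D_obs, X0)
```

The recovered points are determined only up to a rigid motion, so the returned score, not the coordinates themselves, is the quantity to compare across candidate matrices. Because the objective is not convex, the result depends on the initial configuration.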

FIG. 8 is an embodiment of a data processing system 300 in which an embodiment of a method of the present invention may be implemented. The data processing system 300 of FIG. 8 may be located and/or otherwise operate at any node of a computer network, which may for example comprise clients, servers, etc., and which is not illustrated in the Figure. In the embodiment illustrated in FIG. 8, data processing system 300 includes communications fabric 302, which provides communications between processor unit 304, memory 306, persistent storage 308, communications unit 310, input/output (I/O) unit 312, and display 314.

Processor unit 304 serves to execute instructions for software that may be loaded into memory 306. Processor unit 304 may be a set of one or more processors or may be a multi-processor core, depending on the particular implementation. Further, processor unit 304 may be implemented using one or more heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, the processor unit 304 may be a symmetric multi-processor system containing multiple processors of the same type.

In some embodiments, the memory 306 shown in FIG. 8 may be a random access memory or any other suitable volatile or non-volatile storage device. The persistent storage 308 may take various forms depending on the particular implementation. For example, the persistent storage 308 may contain one or more components or devices. The persistent storage 308 may be a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by the persistent storage 308 also may be removable such as, but not limited to, a removable hard drive.

The communications unit 310 shown in FIG. 8 provides for communications with other data processing systems or devices. In these examples, communications unit 310 is a network interface card. Modems, cable modems and Ethernet cards are just a few of the currently available types of network interface adapters. Communications unit 310 may provide communications through the use of either or both physical and wireless communications links.

The input/output unit 312 shown in FIG. 8 enables input and output of data with other devices that may be connected to data processing system 300. In some embodiments, input/output unit 312 may provide a connection for user input through a keyboard and mouse. Further, input/output unit 312 may send output to a printer. Display 314 provides a mechanism to display information to a user.

Instructions for the operating system and applications or programs are located on the persistent storage 308. These instructions may be loaded into the memory 306 for execution by processor unit 304. The processes of the different embodiments may be performed by processor unit 304 using computer implemented instructions, which may be located in a memory, such as memory 306. These instructions are referred to as program code, computer usable program code, or computer readable program code that may be read and executed by a processor in processor unit 304. The program code in the different embodiments may be embodied on different physical or tangible computer readable media, such as memory 306 or persistent storage 308.

Program code 316 is located in a functional form on the computer readable media 318 that is selectively removable and may be loaded onto or transferred to data processing system 300 for execution by processor unit 304. Program code 316 and computer readable media 318 form a computer program product 320 in these examples. In one example, the computer readable media 318 may be in a tangible form, such as, for example, an optical or magnetic disc that is inserted or placed into a drive or other device that is part of persistent storage 308 for transfer onto a storage device, such as a hard drive that is part of persistent storage 308. In a tangible form, the computer readable media 318 also may take the form of a persistent storage, such as a hard drive, a thumb drive, or a flash memory that is connected to data processing system 300. The tangible form of computer readable media 318 is also referred to as computer recordable storage media. In some instances, computer readable media 318 may not be removable.

Alternatively, the program code 316 may be transferred to data processing system 300 from computer readable media 318 through a communications link to communications unit 310 and/or through a connection to input/output unit 312. The communications link and/or the connection may be physical or wireless in the illustrative examples. The computer readable media also may take the form of non-tangible media, such as communications links or wireless transmissions containing the program code.

The different components illustrated for data processing system 300 are not meant to provide architectural limitations to the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a data processing system including components in addition to or in place of those illustrated for data processing system 300. Other components shown in FIG. 8 can be varied from the illustrative examples shown. For example, a storage device in data processing system 300 is any hardware apparatus that may store data. Memory 306, persistent storage 308, and computer readable media 318 are examples of storage devices in a tangible form.

Therefore, as explained at least in connection with FIG. 8, the present invention is also directed to a system for determining the geometry and/or the localisation of an element, a computer program product for determining the geometry and/or the localisation of an element, and a computer data carrier.

In accordance with a further embodiment of the present invention, a computer data carrier is provided that stores presentation content created while employing the methods of the present invention.

Although the present invention has been described in more detail in connection with its embodiment for determining the geometry of a room, the present invention finds applicability in connection with many other fields.

The present invention can be used for determining the exact position of a receiver r, which is a person in FIG. 7. In this case a satellite, e.g. a GPS satellite, is the source s of a radio signal, which can be reflected by buildings B1, B2. If the echo e1 is not taken into account, the location of the mobile device r of the person can be computed incorrectly (the mobile device r will be considered to be located at {tilde over (r)}).

Knowing the position of the satellite s and the position of the buildings B1, B2, etc. (this is possible e.g. by using an electronic map), and applying the method according to the invention, it is possible to accurately locate the mobile device r, and hence the person, without any error.

An application of the method lies in neurology. Neural activity is measured by electrodes introduced into the human or animal brain. These electrodes pick up signals coming from multiple neurons. Neural spike sorting aims at identifying spikes coming from a single neuron: such identification is a labeling or clustering problem. The method according to the invention can be applied to solve it. Clustering is done based on the spike shape and the relative spike amplitudes at different electrodes.

Since the human or animal tissue is homogeneous and the electric signals are observed through line-of-sight propagation, the relative spike amplitudes depend on the distance between the electrodes and the neurons. The exact amplitude pattern depends on the electrode array geometry and on the mutual position of the electrode array and a given neuron.

In the noiseless case, knowing the characteristics of the propagation in the human or animal tissue and having a sufficient number of electrodes would uniquely identify the location of each given neuron.

In the noisy case, the method according to the invention makes it possible to find the likely location of each neuron by finding the closest EDM.

The method of the invention can also be used in audio-forensics. For example, a person moving in a room while talking on a phone might enable us to learn the shape of that room based on the audio signal transmitted over the phone channel.

The method according to the invention can also be applied in CDMA, or in general in MIMO communications. A possible application is accurate channel estimation. In multipath propagation (for example in indoor channels), the receiving antennas pick up the direct signal and a number of echoes or reflections. These reflections, as discussed, can be modeled by image sources. It is then possible to estimate the EDM corresponding to multiple emitting and receiving antennas, and then include the image sources. It is then possible to estimate the locations of these image sources, and then find the “perfect” locations of the corresponding path components in the impulse responses.

Furthermore, if the geometry or the position of the antenna arrays changes, it is likely that the major reflections will still be coming from the same reflectors. It is then possible to efficiently re-estimate the channel by only learning the new geometry of the antenna array.

Advantageously, the method according to the invention can be used to boost the signal power, as already attempted by “RAKE” receivers. However, such receivers try to decide where the individual channel taps are from the estimated impulse responses. In contrast, with the method according to the invention, after estimating the shape of the room it is possible to have a perfect knowledge of the image source locations, and this could be used for correctly combining the reflected signals in order to boost the power.

The method according to the invention can be applied to a ToF (Time of Flight) camera, where a single light pulse illuminates the scene, and the scene depth is then computed based on the travel time of light. On the camera side there is a pixel array where the pixels are time-resolving sensors (or there is a shutter that has the role of time resolving). The method according to the invention can substantially reduce the number of pixels needed by approximating the scene with a number of planar reflectors and finding the image source corresponding to each planar reflector by using the EDM.

Another possible application of the method according to the invention is indoor sound source localization, usually considered difficult since the reflections are difficult to predict and they masquerade as sources.

Another set of applications is in teleconferencing and auralization, where one would, perhaps for different reasons, like to compensate for the room influence or create an illusion that the sound is played in a specific room. This largely consists in compensating for the early reflections, which in turn requires knowledge of the reflector locations. The listed techniques work because knowing the boundary conditions allows the RIR to be computed for an arbitrary source-receiver geometry inside the room.

A different field of application is wave field synthesis: knowing the locations of early reflections might make it possible to develop more specific indoor sampling theorems.

1. A method for determining the geometry and/or the position of an object, comprising the steps of sending one or more signals with one transmitter; receiving by one or more receivers the transmitted signals and echoes of the transmitted signals reflected by one or more reflective surfaces; building with a computing module a first Euclidean distance matrix corresponding to mutual positions of the receivers; adding to said matrix a new row and a new column, the new row and a new column corresponding to the time of arrivals of at least some of said echoes, and computing the rank of the modified matrix, or computing the distance between the modified matrix from a true Euclidean Distance Matrix; determining the geometry and/or the position of the object based on the computed information.

2. The method of claim 1, wherein only first order echoes are considered.

3. The method of claim 1, wherein only echoes received during a predetermined time window are considered.

4. The method of claim 1, said object being a convex room, said transmitter being a loudspeaker, each receiver being a microphone, said geometry being a 2D geometry, the number of receivers being 3.

5. The method of claim 1, said object being a convex room, said transmitter being a loudspeaker, each receiver being a microphone, said geometry being a 3D geometry, the number of receivers being higher than 4.

6. The method of claim 1, said object being a receiver, said transmitter being a satellite, said receiver being a mobile device.

7. The method of claim 1, comprising determining the geometry and/or the localisation of an object comprising the step of labelling echos.

8. The method of claim 1, comprising determining which of the peaks of the impulse response received by each receiver correspond to which reflective surface.

9. The method of claim 1, comprising verifying if the augmented matrix still verify the rank property according which a EDM in Rn has a rank at most n+2, n being an integer and positive number.

10. The method of claim 9, comprising testing at least some echoes combination and selecting the combination for which the rank property is satisfied.

11. The method of claim 1, comprising augmenting said EDM matrix by a vector t formed by the TOA from the transmitter to each receiver.

12. The method of claim 1, comprising determining the location of the transmitter by using least-squared distance trilateration.

13. The method of claim 1, comprising multi-dimensional scaling.

14. The method of claim 13, comprising applying a s-stress criterion.

15. A system for determining the geometry and/or the localisation of an object, comprising: a transmitter for sending one or more signals; one or more receivers for receiving the transmitted signals and the echoes of the transmitted signals as reflected by one or more reflective surfaces; a first computing module for building a first Euclidean Distance Matrix (EDM) corresponding to mutual positions of the receivers; a second computing module for adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of said echoes and computing its second rank or its distance from the first EDM; a third computing module for determining the geometry and/or the position of the object based on said second rank or distance.

16. The system of claim 15, the first, second and third modules being the same module.

17. The system of claim 15, the transmitter being a loudspeaker, the receiver being a microphone, the object being a room comprising said loudspeaker and said microphone.

18. The system of claim 15, the transmitter being a satellite, the receiver a mobile device, the object being said mobile device.

19. A computer program product, comprising: a tangible computer usable medium including computer usable program code for determining the geometry and/or the localisation of an object, the computer usable program code being used for building a first Euclidean Distance Matrix (EDM) comprising the mutual positions of the receivers; adding to the EDM a new row and a new column, the new row and a new column comprising time of arrivals of echoes of the signals transmitted by a transmitter as reflected by one or more reflective surfaces and received by one or more receivers and computing its rank or its distance to the first EDM; determining the geometry and/or the position of the object based on said rank or distance.